Automatic Reclustering of Objects in Very Large Databases for High Energy Physics

Authors

  • Koen Holtman
  • Peter van der Stok
  • Ian Willers
Abstract

In the very large object database systems planned for some future particle physics experiments, typical physics analysis jobs will traverse millions of read-only objects, many more objects than fit in the database cache. Thus, a good clustering of objects on disk is highly critical to database performance. We present the implementation and performance measurements of a prototype reclustering mechanism which was developed to optimise I/O performance under the changing access patterns in a high energy physics database. Reclustering is done automatically and on-line. The methods used by our prototype differ greatly from those commonly found in proposed general-purpose reclustering systems. By exploiting some special characteristics of the access patterns of physics analysis jobs, the prototype manages to keep database I/O throughput close to the optimum throughput of raw sequential disk access.

Similar resources

Reclustering of High Energy Physics Data

The coming high-energy physics experiments will store Petabytes of data into object databases. Analysis jobs will frequently traverse collections containing millions of stored objects. Clustering is one of the most effective means to enhance the performance of these applications. This paper presents a reclustering algorithm for independent objects contained in multiple possibly overlapping coll...

Reclustering of HEP Data in Object-Oriented Databases

The Large Hadron Collider (LHC), built at CERN, will enter operation in 2005. The experiments at the LHC will generate some 5 PB of data per year, which will be stored in an ODBMS. A good object clustering on the disk drives will be critical to achieve a high data throughput required by future analysis scenarios. This paper presents a new reclustering algorithm for HEP data that maximizes the read ...

Kohonen Self Organizing for Automatic Identification of Cartographic Objects

Automatic identification and localization of cartographic objects in aerial and satellite images have gained increasing attention in recent years in digital photogrammetry and remote sensing. Although the automatic extraction of man-made objects in essence is still an unresolved issue, the man-made objects can be extracted from aerial photos and satellite images. Recently, the high-resolution s...

Data clustering research in CMS

The clustering of objects in an object database is the mapping of objects to locations on physical storage media like disk farms and tapes. The performance of the database, and the physics application on top of it, depends crucially on having a good match between the object clustering and the database access patterns of the physics application. We discuss the results and conclusions of a 3-year...

Clustering and Reclustering HEP Data in Object Databases

As part of the CMS contribution to the RD45 [1] collaboration, database clustering and reclustering have been under investigation for about 1.5 years. The clustering of objects in an object database is the mapping of objects to locations on physical storage media like disk farms and tapes. The performance of the database, and the physics application on top of it, depends crucially on having a g...

Publication date: 1998